A post on how recreate the Our World In Data characteristic choropleth in R.
I really look up to Max Roser and the team at Our World in Data. They have some of the best short form articles condensing a wealth of academic literature to, in their words, “make progress against the world’s largest problems”.
The mission is summed up well in a lecture given at Stellenbosch University by Max Roser this year, included below.
In this tutorial I want to walk through recreating one of their classic chart types in R, the world map choropleth with an overlayed line graph for each coutnry. A typical example shown below.
There is a lot of information about the OWID grapher tool. You can have a look at their github repo and an older reddit AMA if you are interested. It’s a custom system that allows for systematic changes across their website, drawing on data from a central database.
What are the parts I want to recreate? The map has:
a base map, where the colour fill of each country indicates it’s position in a specific measure in a particular year.
a simple line chart that appears when you hover over a country, showing how the measure has changed within a country over time.
a clear legend
a note specifying the source of the data
I walk through creating each of these below.
The base map is sourced from the maps package. I add a three letter country code from the english name of the country using the countrycode package and filter out Antarctica, Greenland and the French Southern and Antarctic Lands.
# preamble
library(tidyverse)
library(sf)
theme_set(theme_light())
# load map
map <- st_as_sf(maps::map(database="world", plot = FALSE, fill = TRUE))
# create code to match coutnry to data with
library(countrycode)
map <- map %>%
mutate(code = countrycode(ID, "country.name", "iso3c"))
# remove clutter from map
country_to_remove <- c(
'Antarctica','Greenland', 'French Southern and Antarctic Lands'
)
map <- map %>%
filter(!ID %in% country_to_remove)
The base map is projected with the Web Mercator or WGS 84 projection, the same one Google Maps uses.
We read in the data as a CSV file, and tidy up the column names so that they are in snake case with the clean_names() command from the very helpful janitor package.
Next we remove the summary groups which we cannot show on the map, including the World Bank country income groupings.
# A tibble: 16 × 1
entity
<chr>
1 East Asia and Pacific
2 Europe and Central Asia
3 European Union
4 High income
5 Latin America and Caribbean
6 Low and middle income
7 Low income
8 Lower middle income
9 Middle East and North Africa
10 Middle income
11 North America
12 South Asia
13 Sub-Saharan Africa
14 Tuvalu
15 Upper middle income
16 World
So what we want to do is use the scale_color_viridis_c() palette. We have to map it to the min and max of the values in our dataset.
# A tibble: 1 × 2
min max
<dbl> <dbl>
1 3.5 68.5
library(viridisLite)
vir_10 <- viridis(n = 10)
smoking_hex <- scales::gradient_n_pal(
colours = vir_10,
values = seq(0, 50, by = 5)
)
scale_smoking <- function() {
scale_color_gradientn(
colours = vir_10,
values = seq(0, 50, by = 5) / 50,
limits = c(0, 68.5),
name = "value"
)
}
The line graph that appears when you hover over OWID map is very simple. It has just the starting value on the y-axis, and the first and last years on the x-axis, and a line that changes colour in accordance with the scale of the choropleth. The hover window which contains the graph also shows the country name, and the value of the measure in the most recent year.
To recreate it, we need store these four values, and draw the coloured line.
# Here the function to plot the line takes only one argument, `cd` the country code
plot_line <- function(cd) {
# get axis marks
label_y <- df %>%
filter(code == cd) %>%
mutate(
min_year = min(year),
max_year = max(year)
) %>%
filter(year == min(year))
# plot the line
df %>%
filter(code == cd) %>%
ggplot(aes(year, value)) +
geom_point(cex = 3) +
# mapping the colour of the line segment to the value is done here
geom_line(aes(colour = value), cex = 2, alpha = .7) +
# this scale is created above, with bounds appropriate to this data
scale_smoking() +
scale_y_continuous(
# specifying the break on the y-axis creates the axis text
breaks = c(label_y$value),
labels = scales::percent_format(scale = 1, accuracy = .1),
# the limits argument here ensures the y-axis starts at zero
limits = c(0, NA)
) +
scale_x_continuous(
# x-axis needs only two years, the first and last
breaks = c(label_y$min_year, label_y$max_year)) +
theme(
# removing the axis ticks and lines clears the graph of clutter
axis.ticks.y = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = "none",
text = element_text(size = 20)
) +
labs(
x = NULL,
y = NULL
)
}
# Test the function with India.
plot_line("IND")

library(gt)
library(gtExtras)
make_table <- function(cd) {
message("Making table for ", cd)
vars <- df %>%
filter(code == cd) %>%
filter(year == max(year))
plot <- plot_line(cd)
# the `ggplot_image` command outputs an image that can easily be put into a gt table when formatted as markdown
plot <- gt::ggplot_image(plot, height = px(250), aspect_ratio = 1.6)
tbl <- tibble(plot = plot, value = vars$value, context = glue::glue("in {vars$year}"))
gt(tbl) %>%
fmt_markdown(columns = c(plot)) %>%
fmt_percent(value, scale_values = F, decimals = 1) %>%
# the `merge_stack` command joins the value and the year in one cell
# the `smoking_hex` function we created above makes the text coloured appropriately
gt_merge_stack(col1 = value, col2 = context, palette = c(smoking_hex(tbl$value), "grey")) %>%
tab_style(
style = cell_text(size = "xx-large"),
locations = cells_body(
columns = c(value)
)
) %>%
tab_header(
# title table with coutnry name
title = md(glue::glue("**{vars$entity}**"))) %>%
tab_options(column_labels.hidden = TRUE) %>%
as_raw_html(inline_css = F)
}
# Test on South Africa
make_table("ZAF")
| South Africa | |
20.3%
in 2020 | |
Here we use the purrr::map command to make the table in raw HTML for each country and save it inside a tibble. The output shows an HTML list in the column called gt.
# A tibble: 162 × 2
code gt
<chr> <list>
1 AFG <html [1]>
2 ALB <html [1]>
3 DZA <html [1]>
4 AND <html [1]>
5 ARG <html [1]>
6 ARM <html [1]>
7 AUS <html [1]>
8 AUT <html [1]>
9 AZE <html [1]>
10 BHS <html [1]>
# … with 152 more rows
We thencreate a tibble called df_map that selects the most recent year for each country from the dataset and joins it to the map by the country code variable we created above. Finally we join this to the tibble of tables called gt_tables.
Now we are ready to create the interactive figure!
We begin by drawing a static map in grey, with data from the original map. Next we overlay the interactive choropleth. The grey static map will show through all the countries we don’t have data on in the dataset.
library(ggtext)
library(ggiraph)
g <- df_map %>%
ggplot(aes(geometry = geom)) +
geom_sf(data = map, fill = "grey80") +
geom_sf_interactive(aes(fill = value, tooltip = gt)) +
scale_fill_binned(type = "viridis", labels = scales::percent_format(scale = 1)) +
cowplot::theme_minimal_grid() +
theme(legend.position = "bottom") +
guides(fill = guide_colourbar(barwidth = 20, barheight = .5, title.position = "top", label = TRUE)) +
labs(
fill = "Share of adults who smoke, 2020",
caption = "Source: World Health Organization (via World Bank)"
) +
theme(
plot.background = element_rect(fill = "white", color = NA),
plot.title = element_text(hjust = 0.5, family = "marker", size = 50),
plot.subtitle = element_markdown(size = 20, family = "open", lineheight = 0.5),
plot.caption = element_markdown(size = 12, family = "open"),
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid.major = element_line(color = "grey80", size = 0.1),
legend.title.align = .5
)
g

# customizing the CSS makes the hover box easier to read.
tooltip_css <- "background-color:gray;color:white;padding:10px;border-radius:5px;text-align:center;"
ggiraph(
ggobj = g,
options = list(
opts_tooltip(css = tooltip_css),
opts_sizing(width = 1)
)
)